Index Options
The options page lets you specify additional settings for
the indexing operation:
General Options
- Treat Zip Files as directories - check
this option if you want files inside Zip archives to be indexed. These
files can then be searched and viewed just like any other files. Note
that you should not add the zip files themselves to the include
files list.
- Include Word Counts - check this
box if you want Wilbur to keep track of how many times each word appears
in each file. This value is displayed as one of the columns in the
file list pane and the results can be sorted
by this ranking. To save space, only the first 256 occurrences are
counted (one byte’s worth). If the size of the index is an issue,
you can clear this box to save some space.
- Track All Files - when this
box is checked, Wilbur will index information on all files in all directories
it visits, not just the files in the include list. Files not
in the include list won't have their contents indexed, but file name,
file folder, size, date modified and attribute information will be
indexed and can be searched for.
Wilbur records file information only for folders that are in the
include list or are subdirectories of folders in it. If
you want to track all files on your machine without indexing
their contents, you can use a dummy include such as
c:\*.xxx to force Wilbur into all folders. Of course, if you
are already using something like c:\*.doc, this is not necessary.
- Minimum Word Length - this is the length of the shortest
word that Wilbur will index. The default value is three. You can
increase it to cut down on the number of nonsense words that Wilbur
indexes and hence reduce the size of the index, but then
searches for short terms such as IBM will no longer be possible.
You can also make this value smaller, but you risk indexing a lot of
spurious words when indexing binary files such as word processing
documents.
- Maximum Word Length - this is the length of the longest
word that Wilbur will index. Like the minimum, this can be modified
as appropriate for the material you are indexing. For example, programmers
indexing source code would probably want a fairly large value, since
variable and routine names can often be quite long. A value of
zero has a special meaning: it causes Wilbur to use a limit of
100 characters on material that appears to be pure text and a limit
of 20 for files which appear to contain binary information. This was
the behaviour of Wilbur versions prior to 1.5.
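As a rough illustration of how the two length limits interact, the following Python sketch filters candidate words by length. It is not Wilbur's actual code: the function names, the looks_binary flag and the letters-only word pattern are assumptions made for this example, and it assumes that over-long words are skipped rather than truncated.

```python
import re

TEXT_MAX = 100    # limit for apparently pure-text files when the setting is 0
BINARY_MAX = 20   # limit for apparently binary files when the setting is 0


def effective_max_length(max_len: int, looks_binary: bool) -> int:
    """Resolve the Maximum Word Length setting, including the special value 0."""
    if max_len == 0:
        return BINARY_MAX if looks_binary else TEXT_MAX
    return max_len


def indexable_words(text: str, min_len: int = 3, max_len: int = 0,
                    looks_binary: bool = False) -> list[str]:
    """Return the alphabetic words whose length falls within the limits.

    Assumption: words outside the limits are simply skipped.
    """
    limit = effective_max_length(max_len, looks_binary)
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if min_len <= len(w) <= limit]
```

With the default minimum of three, indexable_words("IBM is a big company") keeps IBM, big and company; raising min_len to 4 drops IBM, which is the trade-off described above.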
Additional Characters to Index
For more control over the characters considered significant, the following
options are provided:
Numbers - the options available are:
- Trailing numbers only - despite the name, digits can appear
anywhere in the word, as long as the word does not start with a digit.
- All numbers - number characters are just as significant as
alphabetic characters. Of course in some material this will greatly
increase the number of unique words indexed.
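The difference between the two number options can be sketched as a simple acceptance test. This is only an illustration under stated assumptions: accepts_word and its all_numbers parameter are hypothetical names, and the sketch assumes a word that starts with a digit is rejected outright in the "Trailing numbers only" mode.

```python
def accepts_word(word: str, all_numbers: bool) -> bool:
    """Decide whether a candidate made of letters and digits is indexable."""
    if not word or not word.isalnum():
        return False
    if all_numbers:
        # "All numbers": digits count the same as letters.
        return True
    # "Trailing numbers only": digits are allowed anywhere except first.
    return not word[0].isdigit()
```

Under these assumptions, mp3 is accepted in both modes, while 3com and 2024 are accepted only when all_numbers is true.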
Other Characters
You can explicitly specify characters that are to be considered valid.
If your language is not among the few listed above, just enter the additional
characters required here.
Characters placed in the ‘Others anywhere’ box are valid
anywhere in a word while characters in the ‘Others not starting
word’ box are not valid as the first character in a word.
For instance, someone who wanted to search for the term C++ in resume
files could accomplish this by placing a single plus sign in the ‘Others
not starting word’ box. You would not want to do this
if you were indexing program source code, since there the plus sign often
marks the end of a variable name.
Note that if you include characters such as * or ? which have special
meaning in searches, they will lose their special meaning and be treated
just like other characters.
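The effect of the two boxes can be sketched with a small word splitter. This is a hedged illustration, not Wilbur's implementation: split_words and its anywhere and not_starting parameters are hypothetical names standing in for the ‘Others anywhere’ and ‘Others not starting word’ boxes, and word length limits are ignored for brevity.

```python
def split_words(text: str, anywhere: str = "", not_starting: str = "") -> list[str]:
    """Split text into words, honouring two sets of extra valid characters."""
    words, current = [], ""
    for ch in text:
        if ch.isalpha() or ch in anywhere:
            # Letters and 'anywhere' characters may start or continue a word.
            current += ch
        elif ch in not_starting and current:
            # These characters may only continue a word already in progress.
            current += ch
        else:
            # Any other character ends the current word, if there is one.
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

With not_starting="+", the text "C++ and Java" yields the word C++, but "total+count" is also kept as a single word, which is why this setting is a poor fit for source code.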